Estimation of AR and ARMA models by
stochastic complexity

Ciprian Doru Giurcăneanu and Jorma Rissanen

Tampere University of Technology, and Technical University of Tampere and Helsinki, and 
Helsinki Institute for Information Technology 

Abstract: In this paper the stochastic complexity criterion is applied to
estimation of the order in AR and ARMA models. The power of the criterion
for short strings is illustrated by simulations. The criterion requires an
integral of the square root of the Fisher information, which is evaluated by
a Monte Carlo technique. The stochastic complexity, which is the negative
logarithm of the Normalized Maximum Likelihood universal density function,
is given. Also, exact asymptotic formulas for the Fisher information matrix
are derived.



1. Introduction 

The negative logarithm of the NML (Normalized Maximum Likelihood) universal
model, called the stochastic complexity, provides a powerful criterion for estimation
of the model structure, such as the optimal collection of the regressor variables in
the linear quadratic regression problem, especially for small amounts of data.
It involves the integral of the square root of the Fisher information, which is easy
to calculate when the regressor matrix does not depend on the parameters. Although
modeling Gaussian time series with AR models is an instance of the linear quadratic
regression problem, order estimation with the stochastic complexity is troublesome
because the regressor matrix is determined by the parameters,
and the Fisher information is not constant. The same problem arises
with the ARMA models, which have the additional difficulty of calculating the
maximum likelihood parameters.

In this paper we resort to Monte Carlo integration to overcome the problem
posed by the nonconstant Fisher information, and we study by simulations the
efficiency of the resulting order estimation criterion. Although exact formulas
exist for the Fisher information matrix, they are quite cumbersome to evaluate,
and we consider asymptotic simplifications. This may run against the intent of
getting a criterion for small amounts of data, but the asymptotic estimates appear
to be good enough, and for the short data sequences generated here the resulting
criterion is still superior to competing criteria such as BIC [20], which is
equivalent to a crude asymptotic version of the MDL criterion [15], and a recently
suggested one, KICC [21], or bias-corrected Kullback-Leibler criterion.

We describe below the NML model for AR and ARMA class of models, and 
discuss its optimality properties. We also derive in the Appendix the asymptotic 
form of the Fisher information matrix for the general ARMA class of models. 



2. Normalized maximum likelihood model 

We consider the ARMA model:

(1) $y_t + \sum_{i=1}^{n} a_i y_{t-i} = e_t + \sum_{j=1}^{m} b_j e_{t-j}$,

where $e_t$ is zero-mean white Gaussian noise of variance $\sigma^2$. The integers $m, n$ are
nonnegative, and all coefficients $a_i$ and $b_j$ are real-valued. We can equivalently write
$y_t = \frac{B(q)}{A(q)} e_t$, where $B(q) = 1 + b_1 q^{-1} + \cdots + b_m q^{-m}$,
$A(q) = 1 + a_1 q^{-1} + \cdots + a_n q^{-n}$, and $q^{-1}$ is the unit delay operator.
We will use the notation ARMA(n,m) for the class of the normal density functions
$\{f(y^N; \theta)\}$ defined by such processes, where
$\theta = (a_1, \ldots, a_n, b_1, \ldots, b_m, \sigma^2)$, the parameters ranging over a
subset of $\mathbb{R}^k$, where $k = n + m + 1$. Let $\hat\theta(y^N)$ denote the maximum
likelihood estimates of the parameters $\theta$.

In order to define the range of the parameters properly we need to consider
another equivalent parametrization in terms of the roots of the two polynomials,

(2) $\prod_{i=1}^{n} (1 - g_i q^{-1})\, y_t = \prod_{j=1}^{m} (1 - h_j q^{-1})\, e_t$,

together with the noise variance $\sigma^2$. We denote by $g_i$ the zeros of A(q) and by $h_j$
the zeros of B(q). There are no repeated poles or zeros nor pole-zero cancellations.
We specify in the Appendix exactly the further restrictions on the type of the zeros,
but for now let the same symbol $\theta$ denote the new parameters ranging over
$\Theta \subset \mathbb{R}^k$.
Consider the NML density function, [^.fli]. 

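$\hat f(y^N; n, m) = \frac{f(y^N; \hat\theta(y^N))}{C_{k,n}}$,

where

$C_{k,n} = \int_{x^N:\, \hat\theta(x^N) \in \Theta} f(x^N; \hat\theta(x^N))\, dx^N = \int_{\Theta} g(\hat\theta; \hat\theta)\, d\hat\theta$,

and $g(\hat\theta; \theta)$ denotes the density function on the statistic $\hat\theta$ induced by
$f(y^N; \theta)$. In the equation above, we use the identity
$f(x^N, \hat\theta(x^N); \theta) = f(x^N \mid \hat\theta(x^N); \theta)\, g(\hat\theta(x^N); \theta)$,
which is integrated first over $x^N$ with $\hat\theta(x^N) = \hat\theta = \theta$ kept fixed,
which gives unity, and then over $\hat\theta$.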

Under the main assumption that the convergence in distribution by the Central
Limit Theorem applies to the ML estimates, the stochastic complexity,
$L(y^N; n, m) = \ln 1/\hat f(y^N; n, m)$, is given by

(3) $L(y^N; n, m) = -\ln f(y^N; \hat\theta(y^N)) + \frac{k}{2} \ln \frac{N}{2\pi} + \ln \int_{\Theta} |J(\theta)|^{1/2}\, d\theta + o(1)$,

where $\Theta$ denotes the parameter space, and $J(\theta)$ is the Fisher information matrix
[3]. The rate of convergence of the $o(1)$ term is determined by the convergence of the
ML estimates to the normal density function.

To get a criterion for the structure in general we ought to add the code length
needed to encode the structure, but here we take the simple case where the structure
consists of a few first coefficients of the ARMA model, whose code length is much
shorter than the stochastic complexity and is therefore ignored. (If k is not small,
we can use the estimate $L(k) = \ln k + 2 \ln \ln k$.)

The NML model has the following two optimality properties, which justify its 
name: 

(1) It is the unique solution $\hat f = g = q$ to the following maxmin problem

$\max_{g} \min_{q} E_g \log \frac{f(y^N; \hat\theta(y^N))}{q(y^N)}$,

where g and q range over any sets that include $\hat f$. Notice that the logarithm of the
ratio is the difference between the ideal code length $\log 1/q(y^N)$ and the unattainable
lower bound for any code length in the ARMA class.

(2) If the data generating distribution g is restricted to the ARMA class, the
mean of the stochastic complexity with respect to the model $\theta$ cannot be beaten
by any model whatsoever, except for $\theta$ in a set whose volume goes to zero as N
grows.



3. Linear regression with constant regressor matrix 

Before discussing the AR models we illustrate the stochastic complexity criterion 
for linear quadratic regression with constant Fisher information by comparing it 
with the BIC and the KICC criteria in a simple polynomial fitting problem for 
small amounts of data. 

For linear regression with a constant regressor matrix $X = \{x_{it}\}$ the stochastic
complexity criterion takes the form, [19],

min{(A - k) In f + k In R + (N - k - 1) In — (k - 1) In k}. 

ier n — k 

The index $\gamma = i_1, \ldots, i_k$ consists of the indices of the rows $x_i$ of the $k \times n$
regressor matrix included in the linear combination

$y_t = \sum_{i \in \gamma} \beta_i x_{it} + e_t, \quad t = 1, \ldots, N,$

$\hat\tau$ is the minimized squared error per symbol, and
$\hat R = \frac{1}{n} \hat\beta^T X_\gamma X_\gamma^T \hat\beta$, where $X_\gamma$ is
the $k \times n$ submatrix of X consisting of the retained rows.

Notice that there are no hyperparameters defining the range of the parameters
$\beta_i$ and $\tau$. They have been renormalized away.
Table 1

Order estimation of the polynomial model in Example 1. The true order is k = 3. For each
criterion, the probability of correct estimation of the order is computed from $10^5$ runs. Also
shown is the probability of overestimation of the polynomial order ($4 \le \hat k \le 10$). The
probability of underestimation ($0 \le \hat k \le 2$) is almost zero for all analyzed criteria.
The best result for each sample size N is represented with bold font.

Order          Criterion   Sample size (N)
                           25     30     40     50     60     70     80     90     100
$\hat k = k$   NML         0.94   0.95   0.96   0.97   0.97   0.97   0.98   0.98   0.98
               BIC         0.79   0.84   0.89   0.91   0.93   0.94   0.95   0.95   0.95
               KICC        0.93   0.92   0.91   0.91   0.90   0.90   0.90   0.90   0.89
$\hat k > k$   NML         0.06   0.05   0.04   0.03   0.03   0.03   0.02   0.02   0.02
               BIC         0.21   0.16   0.11   0.09   0.07   0.06   0.05   0.05   0.05
               KICC        0.07   0.08   0.09   0.09   0.10   0.10   0.10   0.10   0.11



Example 1. We discuss an example of polynomial fitting considered in [21] to
investigate the performance of a model selection criterion called KICC. It is obtained
by applying a bias correction to KIC (Kullback Information Criterion), [6],
and it is recommended for linear regression problems when the sample
size is small. The underlying signal is generated by a third-order polynomial model
$y = x^3 - 0.5x^2 - 5x - 1.5$, where the points $x_1, \ldots, x_N$ are chosen to be uniformly
distributed in $[-3, 3]$. The measurements are obtained by adding to the signal values
zero-mean white Gaussian noise, whose variance is selected such that the
signal-to-noise ratio is SNR = 10 dB. For each number of data points N, between 25 and 100,
$10^5$ different realizations are produced, to which polynomials of degree 0, 1, . . . , 10
are fitted with the least squares method.
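The data generation and fitting steps of Example 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, the SNR is assumed to be the ratio of the mean squared signal value to the noise variance, and a common form of BIC, $N \ln(\mathrm{RSS}/N) + (d+1) \ln N$, stands in for the selection rule because it needs no further constants. The NML and KICC criteria are not implemented here.

```python
import math
import random


def cubic(x):
    """Noise-free third-order polynomial of Example 1."""
    return x**3 - 0.5 * x**2 - 5.0 * x - 1.5


def make_data(num_points, snr_db, rng):
    """Draw x uniformly in [-3, 3] and add white Gaussian noise.

    Assumption: SNR = (mean squared signal) / (noise variance).
    """
    xs = [rng.uniform(-3.0, 3.0) for _ in range(num_points)]
    signal = [cubic(x) for x in xs]
    power = sum(s * s for s in signal) / num_points
    noise_std = math.sqrt(power / 10.0 ** (snr_db / 10.0))
    ys = [s + rng.gauss(0.0, noise_std) for s in signal]
    return xs, ys


def residual_sums(xs, ys, max_degree):
    """Residual sum of squares of the least-squares fit for degrees 0..max_degree.

    The columns 1, u, u^2, ... (with u = x/3) are orthonormalised one at a
    time by modified Gram-Schmidt; the residual is deflated after each step.
    """
    us = [x / 3.0 for x in xs]
    basis, residual, rss = [], list(ys), []
    for degree in range(max_degree + 1):
        col = [u**degree for u in us]
        for q in basis:
            proj = sum(c * qi for c, qi in zip(col, q))
            col = [c - proj * qi for c, qi in zip(col, q)]
        norm = math.sqrt(sum(c * c for c in col))
        q = [c / norm for c in col]
        basis.append(q)
        coef = sum(r * qi for r, qi in zip(residual, q))
        residual = [r - coef * qi for r, qi in zip(residual, q)]
        rss.append(sum(r * r for r in residual))
    return rss


def bic_order(xs, ys, max_degree=10):
    """Polynomial degree minimising N ln(RSS/N) + (d + 1) ln N (assumed BIC form)."""
    n = len(ys)
    rss = residual_sums(xs, ys, max_degree)
    scores = [n * math.log(rss[d] / n) + (d + 1) * math.log(n)
              for d in range(max_degree + 1)]
    return min(range(max_degree + 1), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = make_data(50, 10.0, rng)
    print("BIC-selected degree:", bic_order(xs, ys))
```

Repeating the last two lines over many realizations and counting how often the selected degree equals 3 gives the kind of frequency reported in Table 1 for a single criterion.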

The estimates of the order of the polynomial obtained with the NML, BIC and
KICC criteria are in Table 1. We have restricted our investigations to these
three criteria, because in [21] KICC was shown to outperform six other estimation
criteria for N = 25 and N = 30. We see in the table that the NML criterion performs
better than BIC and KICC in all the cases studied. Observe that the number of
correct estimations produced by KICC generally declines when more measurements
are available, while the BIC and the NML results improve with increasing N. For
example, KICC compares favorably with BIC for N = 25, but the situation is
reversed for N = 100.



4. AR models 

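The likelihood density function for an AR model is given by

$f(y^N; \theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{t=1}^{N} (y_t + a_1 y_{t-1} + \cdots + a_n y_{t-n})^2\right)$,

where we put $y_t = 0$ for $t < 1$. The maximized likelihood is
$(2\pi e \hat\sigma^2)^{-N/2}$, where $\hat\sigma^2$ is the minimized sum per symbol,
$\hat\sigma^2 = \frac{1}{N} \sum_{t=1}^{N} (y_t + \hat a_1 y_{t-1} + \cdots + \hat a_n y_{t-n})^2$. The
NML criterion ([3]) has now the expression

(4) $L(y^N; n) = \frac{N}{2} \ln(2\pi e \hat\sigma^2) + \frac{n+1}{2} \ln \frac{N}{2\pi} + \ln \int_{\Theta} |J(\theta)|^{1/2}\, d\theta + o(1)$.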
The Fisher information matrix is given by

$J(\theta) = \begin{pmatrix} R & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}$,

where

$R = \begin{pmatrix} r_0 & r_1 & \cdots & r_{n-1} \\ r_1 & r_0 & \cdots & r_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n-1} & r_{n-2} & \cdots & r_0 \end{pmatrix}$

and $r_i = E[z_t z_{t-i}]$ denote the covariances of the process $z_t = y_t/\sigma$. Applying
the formula in [12] for the parameters transformation and the well-known Vieta's
formulae, it is easy to calculate the Fisher information matrix for the parameter
set given by the model poles $g = (g_1, g_2, \ldots, g_n)$ and the noise variance $\sigma^2$.

Note in (4) that the integral term makes the most important difference between
the expression for the stochastic complexity and the BIC criterion. The integral
carries structural information which BIC lacks, and it generally increases
with n, because the determinant increases.

We note that the contribution of $\sigma^2$ to the integral is decoupled from the
contribution of the other parameters. Consequently we ignore for all the AR models
the contribution of $\sigma^2$, because we do not have any "natural" finite limits for the
range of $\sigma^2$. The constraint of having a stable model restricts the domain of the
magnitudes of the poles to be a hypercube.

Apart from the AR(1) case, for which the integral in (3) can be found in closed
form, $\int_{-1}^{1} \frac{1}{\sqrt{1 - g^2}}\, dg = \pi$, the evaluation of the integral will be done by
the Monte Carlo technique. To be more precise, we use Sobol' sequences [14] to perform the
Monte Carlo integration for AR(n) models with $1 < n \le 6$. For these values all
poles are complex if n is even, and exactly one pole is real-valued if n is odd, which
can be taken advantage of in calculating the form of the information matrix.
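The closed-form AR(1) value can be checked in a few lines. The sketch below assumes the root parametrization (2) with a single real pole $g$, $|g| < 1$ (so that $a_1 = -g$ and the change of variables has unit Jacobian), uses the normalized process $z_t = y_t/\sigma$, and leaves out the $\sigma^2$ factor, as the text does.

```latex
\begin{align*}
z_t &= g\, z_{t-1} + e_t/\sigma, \qquad \operatorname{Var}(e_t/\sigma) = 1, \\
r_0 &= E[z_t^2] = g^2 r_0 + 1 \;\Longrightarrow\; r_0 = \frac{1}{1 - g^2}, \\
\int_{-1}^{1} |J(g)|^{1/2}\, dg &= \int_{-1}^{1} \frac{dg}{\sqrt{1 - g^2}}
   = \arcsin g \,\Big|_{-1}^{1} = \pi .
\end{align*}
```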

Our Matlab implementation is based on the algorithm described on p. 312 in
[14] and the code publicly available at [1]. We perform the Monte Carlo integration
for various AR models with M integration points. But first, to test the accuracy
we use the known result for the AR(1) model. Table 2 shows the fractional error
obtained when $M = 10^5$ and $M = 10^6$. For models of larger order, we report
the value $\Delta = |I_{10^7} - I_{10^6}| / I_{10^7}$, where $I_M$ denotes the Monte Carlo evaluation of
$\int_{\Theta} |J(\theta)|^{1/2}\, d\theta$ calculated from M integration points. We show in Table 2 the results
on $\Delta$ since it is known for Monte Carlo integration with Sobol' sequences that the
fractional error decreases with the number of samples as $(\ln M)^n / M$ [14].

Table 2

Monte Carlo results for the integral term in the stochastic complexity formula (4) for
autoregressive models. For the AR(1) model the fractional error is reported.

Model                          $M$      $I_M$       Fractional error or $\Delta$
AR(1)                          $10^5$   3.131956    0.003067
                               $10^6$   3.138952    0.000840
AR(2) - pure complex poles     $10^6$   42.06
                               $10^7$   47.41       0.11
AR(3) - one real-valued pole   $10^6$   122.67
                               $10^7$   137.73      0.11
AR(4) - pure complex poles     $10^6$   1069.66
                               $10^7$   1358.84     0.21
AR(5) - one real-valued pole   $10^6$   3733.59
                               $10^7$   8307.55     0.55
AR(6) - pure complex poles     $10^6$   23164.39
                               $10^7$   35981.48    0.36
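The AR(1) accuracy check can be imitated with a one-dimensional low-discrepancy sequence. The sketch below is not the Matlab implementation referred to in the text: it uses the base-2 van der Corput sequence as an assumed stand-in for a Sobol' generator, and the function names are hypothetical. It estimates $\int_{-1}^{1} (1 - g^2)^{-1/2}\, dg$ and reports the fractional error against $\pi$.

```python
import math


def van_der_corput(index):
    """Base-2 radical inverse of a positive integer, a point in (0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        denom *= 2.0
        value += (index & 1) / denom
        index >>= 1
    return value


def ar1_integral(num_points):
    """Quasi-Monte Carlo estimate of the AR(1) integral term.

    Points u in (0, 1) are mapped to poles g = 2u - 1 in (-1, 1); the factor
    2 is the length of the integration interval. Index 0 is skipped because
    it would map to the singular endpoint g = -1.
    """
    total = 0.0
    for i in range(1, num_points + 1):
        g = 2.0 * van_der_corput(i) - 1.0
        total += 1.0 / math.sqrt(1.0 - g * g)
    return 2.0 * total / num_points


def fractional_error(estimate, exact=math.pi):
    return abs(estimate - exact) / exact


if __name__ == "__main__":
    for exponent in (12, 16):
        m = 2**exponent
        print(m, fractional_error(ar1_integral(m)))
```

Because the integrand is unbounded at $g = \pm 1$, the error shrinks slowly with the number of points, which is in line with the modest accuracy reported for AR(1) in Table 2.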

Example 2. We evaluate the capabilities of the NML, BIC and KICC criteria for
estimating the order of AR models. The NML criterion is calculated with formula
(4), where the value of the integral term for n > 1 is the one from Table 2
computed with $M = 10^7$ integration points. We extend our experimental framework
by considering another information theoretic criterion, namely the predictive least 
squares criterion PLS, [HI ]. 
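Evaluating the NML criterion for AR models then reduces to a table lookup plus two logarithmic terms. The following is a minimal sketch under stated assumptions: it implements formula (4) without the $o(1)$ term, takes the integral values from Table 2 ($M = 10^7$) and $\pi$ for AR(1), and expects the caller to supply the estimated noise variance for each candidate order. The function names and the dictionary interface are hypothetical.

```python
import math

# Integral term of formula (4): pi for AR(1) (closed form) and the
# M = 10^7 Monte Carlo values of Table 2 for AR(2)..AR(6).
INTEGRAL_TERM = {
    1: math.pi,
    2: 47.41,
    3: 137.73,
    4: 1358.84,
    5: 8307.55,
    6: 35981.48,
}


def nml_ar(sigma2_hat, num_samples, order):
    """Stochastic complexity (4) of an AR(order) fit, o(1) term dropped."""
    fit = 0.5 * num_samples * math.log(2.0 * math.pi * math.e * sigma2_hat)
    dim = 0.5 * (order + 1) * math.log(num_samples / (2.0 * math.pi))
    return fit + dim + math.log(INTEGRAL_TERM[order])


def select_order(sigma2_by_order, num_samples):
    """Return the order with the smallest criterion value.

    sigma2_by_order maps each candidate order to its estimated noise variance.
    """
    return min(sigma2_by_order,
               key=lambda n: nml_ar(sigma2_by_order[n], num_samples, n))


if __name__ == "__main__":
    # Hypothetical variance estimates for orders 1..3 from N = 50 samples.
    print(select_order({1: 1.0, 2: 0.99, 3: 0.985}, 50))
```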

Figure 1 outlines the simulation procedure used in Example 2, and the estimation
results are shown in Tables 3 and 4.

Note that the evaluation of the various criteria for order estimation requires the 
estimate of noise variance for each order between one and six. Moreover, for the 
PLS criterion the computation of the prediction errors must be performed for each 
order and for each sample point. To reduce the computational burden, we resort to 
the fast implementation of the prewindowed estimation method based on predictive 
lattice filters [3, [H- 

Observe in Table 3 that the NML criterion compares favorably with all the
other criteria when the sample size is at least 50. For the smallest amount of data 



For the model order $n \in \{1, 2, 3\}$,

    For each order estimation criterion C and for each sample size N,
    $N \in \{25, 50, 100, 200\}$, initialize with zero two counters:
    $\mathcal{N}^{c}_{N,C}$ for correct estimations and $\mathcal{N}^{o}_{N,C}$ for over-estimations.
    Repeat the following steps 1000 times:

        Generate independently the entries of $\mathcal{P}_\rho$ as outcomes of $U[(0.8, 1)]$,
        and the entries of $\mathcal{P}_\phi$ as outcomes of $U[(0, \pi)]$.
        If n is odd, generate the unique entry of $\mathcal{P}_p$
        according to $U[(-1, -0.8) \cup (0.8, 1)]$.
        Repeat the following steps 1000 times:

            Simulate a time series with 300 entries for the AR(n) process
            whose poles are given by $\mathcal{P}_\rho$, $\mathcal{P}_\phi$, $\mathcal{P}_p$.
            Use null initial conditions and $\sigma^2 = 1$.
            Discard the first 100 entries of the time series and
            dub $z$ the vector formed with the remaining 200 measurements.
            For each sample size $N \in \{25, 50, 100, 200\}$,

                Choose $y^N = [z_1, \ldots, z_N]^T$. Apply each criterion C
                to estimate the model order $\hat n_{N,C}$ from the $y^N$ data,
                under the hypothesis $\hat n_{N,C} \in \{1, \ldots, 6\}$.
                If $\hat n_{N,C} = n$, then increment $\mathcal{N}^{c}_{N,C}$.
                If $\hat n_{N,C} > n$, then increment $\mathcal{N}^{o}_{N,C}$.

            End

        End

    End

    Calculate the probability of correct estimation $p^{c}_{N,C} = \mathcal{N}^{c}_{N,C} / 10^6$,
    and the probability of over-estimation $p^{o}_{N,C} = \mathcal{N}^{o}_{N,C} / 10^6$ for the model order.

End

Fig 1. The simulation procedure applied in Example 2. The notation $U[\cdot]$ is used for the uniform
distribution.
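The data-generation step of this procedure can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, complex poles are assumed to be given as magnitude and angle pairs that are completed with their conjugates, and the process is simulated with null initial conditions and unit noise variance, discarding a burn-in segment as in Fig 1.

```python
import cmath
import math
import random


def ar_coefficients(magnitudes, angles, real_poles=()):
    """Coefficients a_1..a_n of A(q) = prod_i (1 - g_i q^{-1}).

    Each (magnitude, angle) pair contributes a pole and its complex conjugate.
    """
    poles = list(real_poles)
    for rho, phi in zip(magnitudes, angles):
        g = cmath.rect(rho, -phi)
        poles += [g, g.conjugate()]
    coeffs = [1.0 + 0j]
    for g in poles:
        # Multiply the polynomial in q^{-1} by (1 - g q^{-1}).
        coeffs = [c - g * prev
                  for c, prev in zip(coeffs + [0j], [0j] + coeffs)]
    return [c.real for c in coeffs[1:]]


def simulate_ar(coeffs, num_samples, rng, burn_in=100):
    """Simulate y_t + a_1 y_{t-1} + ... + a_n y_{t-n} = e_t with unit variance noise."""
    y = []
    for _ in range(burn_in + num_samples):
        past = sum(a * y[-i] for i, a in enumerate(coeffs, start=1)
                   if i <= len(y))
        y.append(rng.gauss(0.0, 1.0) - past)
    return y[burn_in:]


if __name__ == "__main__":
    rng = random.Random(1)
    # One draw of an AR(3) model as in Fig 1: a complex pair and a real pole.
    rho, phi = rng.uniform(0.8, 1.0), rng.uniform(0.0, math.pi)
    real = rng.choice([-1.0, 1.0]) * rng.uniform(0.8, 1.0)
    a = ar_coefficients([rho], [phi], [real])
    z = simulate_ar(a, 200, rng)
    print(a, len(z))
```

The first N entries of the returned series play the role of $y^N$ for each sample size N.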
the asymptotic calculation of the Fisher information does not seem to be accurate
enough. In most of the cases BIC is ranked the second after the NML, and the results 
of KICC do not improve when the sample size N is increased. For all criteria the 
performances decline for the larger values of the model order, which is clear because 
there is more to learn. Notice the moderate performances of the PLS criterion. We 
mention that another comparative study 0] also reports the moderate capabilities 
of PLS on estimating the order of AR models. This is to be expected since the 
PLS criterion is based on the estimates of the parameters which are shaky for small 
amounts of data. 



5. ARMA models 



The density function for ARMA models, (1), depends on how the initial values of
y are related to the inputs e. A simple formula results if we put $y_i = e_i = 0$ for
$i \le 0$. Then the linear spaces spanned by $y^t$ and $e^t$ are the same. Let $\hat y_{t+1|t}$ be the
orthogonal projection of $y_{t+1}$ on the space spanned by $y^t$. We have the recursion

(5) $\hat y_{t+1|t} = \sum_{i=1}^{m} b_i (y_{t-i+1} - \hat y_{t-i+1|t-i}) - \sum_{i=1}^{n} a_i y_{t-i+1}$,

where $\hat y_{1|0} = 0$. With more general initial conditions the coefficients $b_i$ in (5) will
depend on t; see for instance [17]. The likelihood function of the model is then

(6) $f(y^N; \theta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{t=1}^{N} (y_t - \hat y_{t|t-1})^2\right)$.

The maximized likelihood is $\frac{1}{(2\pi e \hat\sigma^2)^{N/2}}$, where
$\hat\sigma^2 = \min_{a_1, \ldots, a_n, b_1, \ldots, b_m} \frac{1}{N} \sum_{t=1}^{N} (y_t - \hat y_{t|t-1})^2$.
The NML criterion [3] is then given by

(7) $L(y^N; n, m) = \frac{N}{2} \ln(2\pi e \hat\sigma^2) + \frac{n + m + 1}{2} \ln \frac{N}{2\pi} + \ln \int_{\Theta} |J(\theta)|^{1/2}\, d\theta + o(1)$.

In the Appendix we elaborate on the computation of the integral term for the NML
criterion, and the results are applied to the selection of ARMA models in the
following example.
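Recursion (5) translates directly into a loop over the data. The sketch below is an illustration under stated assumptions, not the estimation method used in the paper: the coefficients are supplied by the caller instead of being found by minimization, the initial conditions are zero, and the function names are hypothetical. It returns the prediction errors $y_t - \hat y_{t|t-1}$ and the first two terms of (7).

```python
import math
import random


def innovations(y, a, b):
    """Prediction errors y_t - yhat_{t|t-1} from recursion (5).

    a = [a_1, ..., a_n] and b = [b_1, ..., b_m]; values before the first
    sample are taken to be zero.
    """
    eps = []
    for t in range(len(y)):
        pred = sum(b[j] * eps[t - 1 - j]
                   for j in range(len(b)) if t - 1 - j >= 0)
        pred -= sum(a[i] * y[t - 1 - i]
                    for i in range(len(a)) if t - 1 - i >= 0)
        eps.append(y[t] - pred)
    return eps


def criterion_without_integral(y, a, b):
    """First two terms of (7) for given coefficients (no minimization, no integral)."""
    n_samples = len(y)
    sigma2 = sum(e * e for e in innovations(y, a, b)) / n_samples
    fit = 0.5 * n_samples * math.log(2.0 * math.pi * math.e * sigma2)
    dim = 0.5 * (len(a) + len(b) + 1) * math.log(n_samples / (2.0 * math.pi))
    return fit + dim


def simulate_arma(a, b, noise):
    """Generate y from model (1) driven by the given noise, zero initial conditions."""
    y = []
    for t, e in enumerate(noise):
        value = e
        value += sum(b[j] * noise[t - 1 - j]
                     for j in range(len(b)) if t - 1 - j >= 0)
        value -= sum(a[i] * y[t - 1 - i]
                     for i in range(len(a)) if t - 1 - i >= 0)
        y.append(value)
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
    y = simulate_arma([-0.5], [0.8], noise)
    print(criterion_without_integral(y, [-0.5], [0.8]))
```

With the true coefficients and zero initial conditions the recursion recovers the driving noise, which is a convenient sanity check.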



Table 3

Example 2 - the probability of correct estimation of the AR order. The best result for each
sample size N is represented with bold font.

AR model order   Criterion   Sample size (N)
                             25     50     100    200
n = 1            NML         0.99   0.99   1.00   1.00
                 BIC         0.93   0.95   0.97   0.98
                 KICC        0.95   0.93   0.91   0.90
                 PLS         0.89   0.92   0.95   0.97
n = 2            NML         0.72   0.85   0.87   0.88
                 BIC         0.79   0.85   0.87   0.87
                 KICC        0.82   0.83   0.80   0.78
                 PLS         0.49   0.59   0.66   0.71
n = 3            NML         0.49   0.74   0.83   0.84
                 BIC         0.52   0.71   0.78   0.79
                 KICC        0.51   0.71   0.73   0.69
                 PLS         0.26   0.39   0.47   0.53
Table 4

Example 2 - the probability of overestimation of the order of AR models. The smallest
overestimation probability for each sample size N is represented with bold font.

AR model order   Criterion   Sample size (N)
                             25     50     100    200
n = 1            NML         0.01   0.01   0.00   0.00
                 BIC         0.07   0.05   0.03   0.02
                 KICC        0.05   0.07   0.09   0.10
                 PLS         0.11   0.08   0.05   0.03
n = 2            NML         0.07   0.09   0.11   0.12
                 BIC         0.10   0.11   0.12   0.13
                 KICC        0.06   0.14   0.20   0.22
                 PLS         0.20   0.19   0.17   0.15
n = 3            NML         0.01   0.03   0.06   0.12
                 BIC         0.07   0.09   0.12   0.18
                 KICC        0.03   0.10   0.20   0.29
                 PLS         0.21   0.22   0.23   0.23

Table 5

Results of model selection for the ARMA models in Example 3. The counts indicate for 1000
runs the number of times the structure of the model was correctly estimated by each criterion,
from the set $\{\mathrm{ARMA}(n, m) : n, m \ge 1,\ n + m \le 6\}$. The best result for each sample size N is
represented with bold font.

ARMA model                   Criterion   Sample size (N)
                                         25     50     100    200    400
n = 1, m = 1                 NML         700    812    917    962    989
$a_1 = -0.5$                 BIC         638    776    894    957    983
$b_1 = 0.8$                  KICC        717    710    758    745    756
n = 2, m = 1                 NML         626    821    960    991    994
$a_1 = 0.64$, $a_2 = 0.7$    BIC         532    740    898    961    978
$b_1 = 0.8$                  KICC        586    727    810    846    849
n = 1, m = 1                 NML         851    887    918    931    961
$a_1 = 0.3$                  BIC         766    804    856    903    942
$b_1 = 0.5$                  KICC        860    764    654    614    577

Example 3. We estimate the structure of ARMA models for data generated by
three different processes, which also were used in [11]. For each model, the true
structure and the coefficients are given in Table 5, where we show the estimation
results for 1000 runs. In all experiments we have chosen the variance of the
zero-mean white Gaussian noise to be $\sigma^2 = 1$. We mention that, similarly to the
experiments on the autoregressive models, each data set $y^N$ was obtained after
discarding the first 100 generated measurements. This is to eliminate the effect
of the initial conditions. There exist different methods for estimation of ARMA
models. We selected the one implemented in Matlab as the armax function by Ljung,
which is well described in his book [13].



Appendix: The asymptotic Fisher information matrix 

We focus on the computation of the integral term in equation (7). The model is
assumed to be stable and minimum phase, which means that in (2) the roots of
both B(q) and A(q) are inside the open unit disc. Assume that $n_1$ zeros of A(q)
and $m_1$ zeros of B(q) are real-valued. Then we have the inequalities $0 \le n_1 \le n$
and $0 \le m_1 \le m$. Because all coefficients of A(q) and B(q) are real-valued, the
pure complex poles and zeros occur in complex conjugate pairs, and consequently
the differences $n - n_1$ and $m - m_1$ are both even integers. For the pure complex
poles and zeros we apply the parameterization in 0]:

$g^{*}_{\ell+1} = g_\ell = |g_\ell| \exp(-i\phi_{g_\ell}), \quad \phi_{g_\ell} \in (0, \pi), \quad \ell \in \{n_1 + 1, n_1 + 3, \ldots, n - 1\},$
$h^{*}_{\ell+1} = h_\ell = |h_\ell| \exp(-i\phi_{h_\ell}), \quad \phi_{h_\ell} \in (0, \pi), \quad \ell \in \{m_1 + 1, m_1 + 3, \ldots, m - 1\},$

where the symbol * denotes the complex conjugate. The entries of the parameter
vector $\theta$ are given by:

$\theta = (g_1, \ldots, g_{n_1},\ |g_{n_1+1}|, \phi_{g_{n_1+1}}, \ldots, |g_{n-1}|, \phi_{g_{n-1}},\ h_1, \ldots, h_{m_1},\ |h_{m_1+1}|, \phi_{h_{m_1+1}}, \ldots, |h_{m-1}|, \phi_{h_{m-1}},\ \sigma^2).$

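For the sake of clarity we define the subsets of indices for the $\theta$ parameters:

$\mathcal{P}_p = \{1, 2, \ldots, n_1\}$
$\mathcal{P}_\rho = \{n_1 + 1, n_1 + 3, \ldots, n - 1\}$
$\mathcal{P}_\phi = \{n_1 + 2, n_1 + 4, \ldots, n\}$
$\mathcal{P} = \mathcal{P}_p \cup \mathcal{P}_\rho \cup \mathcal{P}_\phi$
$\mathcal{Z}_p = \{n + 1, n + 2, \ldots, n + m_1\}$
$\mathcal{Z}_\rho = \{n + m_1 + 1, n + m_1 + 3, \ldots, n + m - 1\}$
$\mathcal{Z}_\phi = \{n + m_1 + 2, n + m_1 + 4, \ldots, n + m\}$
$\mathcal{Z} = \mathcal{Z}_p \cup \mathcal{Z}_\rho \cup \mathcal{Z}_\phi$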





Based on (6) we use the following asymptotic expression for the log-likelihood
function of the observations $y_1, \ldots, y_N$, [2], [9]:

$\mathcal{L} = -\frac{1}{2\sigma^2} \sum_{t=1}^{N} e_t^2 - \frac{N}{2} \ln \sigma^2 + \text{constant}.$



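For all $u, v \in \{1, \ldots, n + m + 1\}$, the (u, v) entry of the Fisher information matrix
is given by the formula
$J_{uv} = -\lim_{N \to \infty} \frac{1}{N} E\left[\frac{\partial^2 \mathcal{L}}{\partial\theta_u\, \partial\theta_v}\right]$.
Applying the results in [2] and [9], we obtain in a straightforward manner:

$J_{n+m+1,\, n+m+1} = 1/(2\sigma^4),$
$J_{u,\, n+m+1} = J_{n+m+1,\, v} = 0, \quad \forall u, v \in \{1, \ldots, n + m\}.$

For the following calculations we use the identity
$J_{uv} = \lim_{N \to \infty} \frac{1}{N} E\left[\frac{\partial \mathcal{L}}{\partial\theta_u} \frac{\partial \mathcal{L}}{\partial\theta_v}\right]$.
Consider first the case $u, v \in \mathcal{P}_p$. Simple calculations lead to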

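(8) $J_{u,v} = \lim_{N \to \infty} \frac{1}{N\sigma^4} E\left[\sum_{t=1}^{N} \sum_{p=1}^{\infty} \theta_u^{p-1} e_t e_{t-p} \sum_{s=1}^{N} \sum_{r=1}^{\infty} \theta_v^{r-1} e_s e_{s-r}\right] = \lim_{N \to \infty} \frac{1}{N\sigma^4} \sum_{t=1}^{N} \sum_{p=1}^{\infty} \sigma^4 (\theta_u \theta_v)^{p-1} = \frac{1}{1 - \theta_u \theta_v}.$

We conclude for $u, v \in \mathcal{P}_p \cup \mathcal{Z}_p$ that $J_{u,v} = \frac{S_u S_v}{1 - \theta_u \theta_v}$, where

$S_u = \begin{cases} -1, & u \in \mathcal{P} \\ \phantom{-}1, & u \in \mathcal{Z}. \end{cases}$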

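Formula (8) was deduced in Q for the case when all the poles and the zeros of the
ARMA(n,m) model are real-valued. We evaluate next the entry (u, v) of the Fisher
information matrix for $u \in \mathcal{P}_p \cup \mathcal{Z}_p$ and
$v \in \mathcal{P}_\rho \cup \mathcal{P}_\phi \cup \mathcal{Z}_\rho \cup \mathcal{Z}_\phi$. It is not difficult
to prove that

$\frac{\partial e_s}{\partial\theta_v} = \sum_{r=1}^{\infty} d_{v,r}\, e_{s-r}, \quad \forall s \in \{1, \ldots, N\},$

where the coefficients $d_{v,r}$ are real-valued. Therefore

$J_{u,v} = \lim_{N \to \infty} \frac{S_u}{N\sigma^4} E\left[\sum_{t=1}^{N} \sum_{p=1}^{\infty} \theta_u^{p-1} e_t e_{t-p} \sum_{s=1}^{N} \sum_{r=1}^{\infty} d_{v,r}\, e_s e_{s-r}\right] = S_u \sum_{p=1}^{\infty} \theta_u^{p-1} d_{v,p}.$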



The following closed form expressions of d VtP are given in [H[ for u G V p (J -Z M : 



I 2<S„ cos ( 

A 2^ ®v s i n (p@v+l ) cos — 



u+ll 



i((p-l)8„ + i)8„ 



u sin p^+i 



P = i 



The equations above lead to

$J_{u,v} = 2 S_u S_v \frac{\cos\theta_{v+1} - \theta_u \theta_v}{1 - 2\theta_u \theta_v \cos\theta_{v+1} + \theta_u^2 \theta_v^2},$

for $u \in \mathcal{P}_p \cup \mathcal{Z}_p$ and $v \in \mathcal{P}_\rho \cup \mathcal{Z}_\rho$. Similarly for
$v \in \mathcal{P}_\phi \cup \mathcal{Z}_\phi$ and $p \ge 1$, we have, 0],

$d_{v,p} = -2 S_v \theta_{v-1}^{p} \sin(p\theta_v),$
and it is easy to prove that

$J_{u,v} = -2 S_u S_v \frac{\theta_{v-1} \sin\theta_v}{1 - 2\theta_u \theta_{v-1} \cos\theta_v + \theta_u^2 \theta_{v-1}^2}.$

When $u, v \in \mathcal{P}_\rho \cup \mathcal{P}_\phi \cup \mathcal{Z}_\rho \cup \mathcal{Z}_\phi$, we can apply
the formulas given in Q for the computation of $J_{u,v}$ in case all the poles and the zeros
are purely complex.

Analyzing the sign of the product $S_u S_v$, we find that the matrix $J(\theta)$ can be
re-written more compactly as
$J(\theta) = \begin{pmatrix} G & -C \\ -C^T & H \end{pmatrix}$, where the size of the block
matrix C is $n \times m$. The identity

$\begin{vmatrix} G & -C \\ -C^T & H \end{vmatrix} = \begin{vmatrix} G & C \\ C^T & H \end{vmatrix}$

leads to the conclusion that $\int_{\Theta} |J(\theta)|^{1/2}\, d\theta$ has the same value for the
models ARMA(n,m), ARMA(n+m,0), ARMA(0,n+m). A similar conclusion was drawn in [4] for the
particular case when all the poles and the zeros are real-valued.
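The invariance of the determinant can be checked numerically in the real-root case, where every entry has the form $S_u S_v / (1 - \theta_u \theta_v)$. The sketch below is illustrative only: the function names are hypothetical, the $\sigma^2$ row and column are left out, and the determinant is computed with plain Gaussian elimination to keep the example self-contained.

```python
def fisher_real_roots(poles, zeros):
    """Fisher information block for real poles and zeros.

    Entry (u, v) is S_u S_v / (1 - theta_u theta_v), with S = -1 for poles
    and S = +1 for zeros; the sigma^2 row and column are omitted.
    """
    theta = list(poles) + list(zeros)
    sign = [-1.0] * len(poles) + [1.0] * len(zeros)
    size = len(theta)
    return [[sign[u] * sign[v] / (1.0 - theta[u] * theta[v])
             for v in range(size)] for u in range(size)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size, det = len(a), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


if __name__ == "__main__":
    roots = [0.5, -0.3, 0.7]
    print(determinant(fisher_real_roots(roots[:2], roots[2:])))  # ARMA(2,1)
    print(determinant(fisher_real_roots(roots, [])))             # ARMA(3,0)
    print(determinant(fisher_real_roots([], roots)))             # ARMA(0,3)
```

Flipping the signs $S_u$ amounts to multiplying the matrix on both sides by the same diagonal matrix with entries $\pm 1$, so the three printed determinants agree up to rounding.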



References 

[1] http://www2.math.uic.edu/~hanson/mcs507/cp4f04.html 

[2] Åström, K. (1967). On the achievable accuracy in identification problems. In
Preprints of the IFAC Symposium Identification in Automatic Control Systems.
Prague, Czechoslovakia, 9 pp.

[3] Barron, A., Rissanen, J. and Yu, B. (1998). The minimum description
length principle in coding and modeling. IEEE Trans. Inf. Theory 44 2743-2760.
MR1658898

[4] Box, G. and Jenkins, G. (1970). Time Series Analysis, Forecasting and
Control. Holden-Day, Inc. MR0272138

[5] Bruzzone, S. and Kaveh, M. (1984). Information tradeoffs in using the
sample autocorrelation function in ARMA parameter estimation. IEEE Trans.
on Acoustics, Speech and Signal Processing ASSP-32 (4, Aug.) 701-715.
IMR076339T1

[6] Cavanaugh, J. (1999). A large-sample model selection criterion based
on Kullback's symmetric divergence. Statist. Probability Lett. 42 333-343.
IMR1707T781

[7] Djuric, P. and Kay, S. (1992). Order selection of autoregressive models.
IEEE Trans. Signal. Proces. 40 2829-2833.

[8] Friedlander, B. (1982). Lattice filters for adaptive processing. Proc.
IEEE 70 829-868.

[9] Friedlander, B. (1984). On the computation of the Cramer-Rao bound for
ARMA parameter estimation. IEEE Trans. on Acoustics, Speech and Signal
Processing ASSP-32 (4, Aug.) 721-727. MR0763392

[10] Friedlander, B. and Porat, B. (1989). The exact Cramer-Rao bound
for Gaussian autoregressive processes. IEEE Tr. on Aerospace and Electronic
Systems AES-25 3-8. MR0994344

[11] Hannan, E. and Rissanen, J. (1982). Recursive estimation of mixed
autoregressive-moving average order. Biometrika 69 (1) 81-94. MR0655673

[12] Kay, S. (1993). Fundamentals of Statistical Signal Processing: Estimation
Theory. Prentice-Hall, Inc.

[13] Ljung, L. (1999). System Identification: Theory for the User, 2nd ed. Prentice
Hall, Upper Saddle River, NJ.

[14] Press, W., Teukolsky, S., Vetterling, W. and Flannery, B. (1992).
Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge
University Press. MR1201159




[15] Rissanen, J. (1978). Modeling by shortest data description. Automatica 14
465-471.

[16] Rissanen, J. (1986). Order estimation by accumulated prediction errors. J.
Appl. Prob. 23A 55-61. MR0803162

[17] Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry. World
Scientific Publ. Co., River Edge, NJ, 175 pp. MR1082556

[18] Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE
Trans. Inf. Theory 42 (1, Jan.) 40-47. MR1375327

[19] Rissanen, J. (2000). MDL denoising. IEEE Trans. Inf. Theory 46 (7, Nov.)
2537-2543.

[20] Schwarz, G. (1978). Estimating the dimension of the model. Ann. Stat. 6
461-464. MR0468014

[21] Seghouane, A.-K. and Bekara, M. (2004). A small sample model selection
criterion based on Kullback's symmetric divergence. IEEE Trans. Signal.
Proces. 52 3314-3323. MR2107913

[22] Wax, M. (1988). Order selection for AR models by predictive least squares.
IEEE Trans. on Acoustics, Speech and Signal Processing 36 581-588.



